Abstract

In this report, we primarily analyze the WHO COVID-19 and Google Mobility Trends datasets. For the Introduction, we introduce our primary and secondary question of interest and introduce our three data sources. Next, we provide a brief overview of the coronavirus and examine certain countries who exhibit strong and weak COVID restrictions. The next section provides summary statistics for each country’s number of cases, number of deaths, and case-mortality rate relative to their date in time. For Inferential Analysis, we built three linear regression models where we regressed number of cases, number of deaths, and case mortality rate on four factors from the Google Mobility Data. After that, we conducted a sensitivity analysis to determine if our models satisfied the assumptions made in our report. Finally, we stated our conclusion and examined our findings for Model 1, Model 2, and Model 3.


Introduction

In December of 2019, the occurrence of the coronavirus disease was first observed in Wuhan, China. The disease quickly spread throughout the world, contributing to the world wide lockdowns that kept many indoors for months. This spread was exacerbated by the presence of asymptomatic infected persons who accounted for approximately 40% to 45% of those infected (Oran, 2020). Those with symptoms mostly suffered from fever, coughs, fatigue, diarrhea, sore throat, dyspnea, and anosmia with more extreme cases resulting in blood clots, heart attacks, kidney failure, and severe respiratory failure (Larsen, 2020). Those with weakened immune systems or underlying health conditions, such as obesity, diabetes, and hypertension, are more susceptible to the harsher symptoms of the coronavirus and are at higher risk of mortality(Rezende, 2020).

As an immediate response to the pandemic, scientists sought to create an immediate vaccine for the virus. To further combat the spread and increasing deaths caused by the virus, governments worldwide closed off tourism and changed policies regarding being in public, resulting in limited mobility within each individual country as the majority of populations was indoors and usually went to public places for essential activities. For example, countries like China and the United States put strict nationally mandated directives to limit travel and social and shelter in place policies to close non essential businesses and schools, helping mitigate the spread of the disease. In the observational study by Nouvellet(2021), the amount of COVID-19 transmission following the initial reduction in mobility of 52 countries were analyzed and it was found that transmission decreased significantly at a rate of 73%; however, they also found that after quarantine restrictions eased, the transmission rates went up with the predictive ability of the transmissibility to have decreased. Geographical and socioeconomic factors also account for the differences in the spread of COVID-19 where a reduction in mobility has shown to be more difficult for disadvantaged racial and socioeconomic groups who aren’t able to reduce their mobility as sharply and visit areas associated with higher risk (Chang, 2021). Categories of places in which people are mobile also contribute to the spread of the virus differently as in India where certain places like grocery stores and retail stores associate with the incidence of COVID-19 a lot higher than other areas of mobility (Praharaj, 2020). It can be seen that mobility trends relate to the spread of COVID-19 in a multitude of ways and through analysis, the trend for certain mobility categories of places may help identify whether or not the virus may spread faster or slower at different locations of mobility.

In this project, we look into the implications of how the mobility of human populations across the globe can influence the number of deaths caused by the COVID-19 virus. In particular, we ask the question “Does the amount of mobility (or mobility index) affect the overall death count/number of cases/capita, number of deaths/capita on a global scale?” We also ask our secondary question of “Which mobility categories have the greatest impact on the overall death count/number of cases/capita and number of deaths/capita for overall countries?” We ask these questions as we would like to see how mobility in different places can influence the spread of COVID-19. As people access the different mobility category places through different manners at different rates, there can potentially be different patterns of infection that causes COVID-19 cases and even deaths. We believe that by analyzing the relationship between mobility categories and COVID-19 deaths, we can figure out which mobility category can potentially affect the spread of the virus the most and ultimately we can utilize this data to help with government policies to limit the spread of COVID-19 by reducing the access of these locations. We define mobility categories as six categories that people have access to before and after the spread to COVID-19. These categories are defined by the Google COVID-19 Community Mobility Reports dataset as Grocery and Pharmacy, Parks, Transit Stations, Retail and Recreation, Residential, and Workplaces. To address our question of interest, we will be using this dataset to quantify our amount of mobility and we will also use the WHO COVID-19 dataset to quantify the amount of deaths which can generate cases per capita and deaths per capita when added to the World Bank Dataset for global population sizes. We will be looking at the 175 countries contained in the 3 datasets. We will be using months as our unit of measurement to standardize the data for Google Mobility and WHO COVID-19 to control for the rate of spread for one’s initial infection, the incubation period and potentially one with asymptomatic symptoms. As the incubation period can be up to 3 weeks, we believe that using monthly data will help us control for these variables(Badr, 2020). The data for our analysis will start from the beginning of January and end at the end of February. Through our analysis, we can then see whether the amount of mobility has an affect on the spread of COVID-19 which will be indicated through death count/ capita and death count/number of cases/capita.


Background

The coronavirus pandemic has ravaged the globe for almost a full year. Scientists point towards late 2019, when the virus, later labeled as SARS-CoV-2, was discovered in Wuhan China. Since then, countries have independently responded to the outbreak. Many countries immediately enacted strict travel and lockdown orders, while others enforced measures slowly over time. To this day, countries still have different responses to the novel virus.

Argentina is a country who immediately responded the coronavirus outbreak. On March 20th, the nation, was already under intense financial distress, went into a lockdown. This lockdown required home quarantining, shutting down businesses and schools, and restricting travel in and out of the country. The lockdown’s strict enforcement was met with violence from the police, who were given authority to stop and question anyone they found roaming the streets. Argentina ended their official lockdown on November 6th, but still remain one of the strictest countries in terms COVID prevention measures.

Afghanistan is another country who minimally responded to COVID concerns. According to a 2020 New York Times Article titled “Covid Can’t Compete…,” many Afghans believe COVID is a hoax and “a lie told by the government.” With Afghanistan mainly concerned with war and killings within their borders, their local news outlets sparingly mention the coronavirus outbreak. The overall fear amongst the country is that COVID is nothing more than the regular flu. Afghanistan’s government slowly implemented city lockdowns. Now, prevention measures such as wearing masks and social distancing continue to be followed haphazardly. Afghans continue to “cram into buses, each shoulder to shoulder indoors in restaurants, and cluster together in sprawling bazaars.” These are explanations for why an estimated 32% of Afghanistan’s population have contracted COVID.

For this project, we examined different variables from Google’s Mobility Dataset. Google provides data for the changes in people’s travel since the start of COVID. The dataset records country-wide data on the different reasons for people leaving their homes. We primarily focused on the following three reasons: Grocery and Pharmacy, Parks, and Retail and Recreation. The database lists the percentage changes for each country in relation to their baseline value. The baseline value is essentially the normal value for that day of the week taken from the median of a 5-week period. Essentially, if a country exhibits a negative percentage, that means LESS individuals went out to that particular location compared to their 5-week expected value. Likewise, a positive percentage indicates individuals went out MORE than their 5-week expected value.

In addition to Google’s Mobility dataset, we also examined the World Health Organization’s COVID-19 dataset. WHO records cases and deaths for countries across the globe, spanning daily from 1/3/20 to 3/3/21. Relevant variables captured in this dataset include: “Date_reported”, “Country”, “New_cases”, “Cumulative_cases”, “New_deaths”, and “Cumulative_deaths.” Cumulative cases and cumulative deaths represent the sum of all previous and new cases or deaths to that specific point in time. In addition, we created a new variable “case_mortality_rate” which represents the proportion of individuals who die after contracting COVID.

Finally, we merged Google’s Mobility data with the WHO COVID-19 data. We also added “The World Bank” dataset on 2020 population size for each country. By doing so, we were able to generate new variables such as “cases_per_capita” and “death_per_capita.” Cases per capita was a useful variable because it allowed us to examine how severe the number of cases were relative to other countries.


Descriptive analysis

covidmobility <- read_csv("~/Downloads/Global_Mobility_Report.csv") %>%
    filter(is.na(sub_region_1)) %>%
    mutate(Date_reported = as.Date(date), Country_code = country_region_code) %>%
    select(-c("sub_region_1",
            "sub_region_2",
            "metro_area",
            "iso_3166_2_code",
            "census_fips_code"))

covidWHO <- read_csv("https://covid19.who.int/WHO-COVID-19-global-data.csv") %>%
    mutate(Date_reported = as.Date(Date_reported))

world_pop = read_csv("~/STA_141A_Project_2/World_Pop_Data.csv") %>%
    mutate(Country = name)

Cases by Country Data

cases_by_country = covidWHO %>%  mutate(Year = substr(Date_reported,1,4),
                                 Month = substr(Date_reported,6,7))  %>% 
  na.omit() %>%
  group_by(Country_code,Country) %>% 
  summarise(
        Country_code = unique(Country_code),
           Country = unique(Country),
           WHO_region = unique(WHO_region),
           Cumulative_cases = max(Cumulative_cases),
           Cumulative_deaths = max(Cumulative_deaths),
           case_mortality_rate = Cumulative_deaths/Cumulative_cases) %>%
  mutate("Continent" = countrycode(Country,"country.name","continent") ) %>%
  inner_join(world_pop, by  = "Country" ) %>% 
  mutate(cases_per_capita_2020 = Cumulative_cases/pop2020,
         deaths_per_capita_2020 = Cumulative_deaths/pop2020)

cases_by_country
covid_data = inner_join(covidmobility,covidWHO,by = c("Country_code","Date_reported")) %>% 
  select(c(
           "Country_code",
           "Country",
           "Date_reported",
           "retail_and_recreation_percent_change_from_baseline",   
           "grocery_and_pharmacy_percent_change_from_baseline",
           "parks_percent_change_from_baseline",
           "transit_stations_percent_change_from_baseline",  
           "workplaces_percent_change_from_baseline",
           "residential_percent_change_from_baseline",
           "WHO_region",
           "New_cases",
           "Cumulative_cases",
           "New_deaths",
           "Cumulative_deaths")) %>%
  mutate(Year = substr(Date_reported,1,4),
         Month = substr(Date_reported,6,7)) 

Population Size and Mobility Data by Country

country_data = covid_data %>% group_by(Country) %>% na.omit() %>% dplyr::summarise(Number_Cases = max(Cumulative_cases),
            Number_Deaths = max(Cumulative_deaths),
            case_mortality_rate = Number_Deaths/Number_Cases,
            Retail_Rec_change = mean(retail_and_recreation_percent_change_from_baseline),
            Grocery_Pharm_change = mean(grocery_and_pharmacy_percent_change_from_baseline),
            Parks_change = mean(parks_percent_change_from_baseline),
            Transit_Stations_change = mean(transit_stations_percent_change_from_baseline),
            Work_change = mean(workplaces_percent_change_from_baseline),
            Resid_change = mean(residential_percent_change_from_baseline))


country_data = inner_join(country_data, world_pop, by  = "Country" ) %>% 
  mutate(cases_per_capita_2020 = Number_Cases/pop2021,
         deaths_per_capita_2020 = Number_Deaths/pop2021)
country_data %>% select(-c(Number_Cases,Number_Deaths,case_mortality_rate))
covid_month_data_20 = covid_data %>% 
  na.omit() %>%
  #filter(Year == "2020") %>%
  group_by(Country_code,Country,Month,Year) %>% 
  summarise(
        Country_code = unique(Country_code),
           Country = unique(Country),
           retail_and_recreation_percent_change_from_baseline = mean(retail_and_recreation_percent_change_from_baseline),   
           grocery_and_pharmacy_percent_change_from_baseline =mean(grocery_and_pharmacy_percent_change_from_baseline) ,
           parks_percent_change_from_baseline = mean(parks_percent_change_from_baseline),
           transit_stations_percent_change_from_baseline = mean(transit_stations_percent_change_from_baseline),  
           workplaces_percent_change_from_baseline = mean(workplaces_percent_change_from_baseline),
           residential_percent_change_from_baseline = mean(residential_percent_change_from_baseline),
           WHO_region = unique(WHO_region),
           New_cases = sum(New_cases),
           Cumulative_cases = max(Cumulative_cases),
           New_deaths = sum(New_deaths),
           Cumulative_deaths = max(Cumulative_deaths) ) 
#covid_month_data_20
covid_month_data_20<- covid_month_data_20 %>% filter(Year == "2020")
#head(covid_month_data_20)

cd <- covid_month_data_20 %>% select(Country, Month, Country_code,
                                     Cumulative_deaths, Cumulative_cases, New_cases, New_deaths,
                                     ret = retail_and_recreation_percent_change_from_baseline, 
                                     gro = grocery_and_pharmacy_percent_change_from_baseline,
                                     par = parks_percent_change_from_baseline, 
                                     tra = transit_stations_percent_change_from_baseline, 
                                     wor = workplaces_percent_change_from_baseline, 
                                     res = residential_percent_change_from_baseline
                                     )
cd["Country_code"] <-   countrycode(cd$Country_code, "iso2c", "iso3c")
cdt <- cd %>% convert(num(Month))

Continent Sums

continent_sum = function(variable_of_interest,abbreviation,x_axis_label,graph_type = "boxplot"){

sum_stats = c("Avg" ,"SD"  , "Min" , "Q1"  , "Med" , "Q3"  ,"Max" )
column_sum_headers = paste(sum_stats,"_",abbreviation,sep="")

mortality_rate_sum = cases_by_country %>%
    group_by(Continent) %>%     #grouping by class type
  na.omit() %>%
    summarise(      Avg   = mean(get(variable_of_interest)),         #mean
                              SD    = sd(get(variable_of_interest)),           #standard deviation
                            Min   = min(get(variable_of_interest)),          #min
                            Q1    = quantile(get(variable_of_interest),.25), #quartile 1
                            Med   = median(get(variable_of_interest)),       #median
                            Q3    = quantile(get(variable_of_interest),.75), #quartile 3
                            Max   = max(get(variable_of_interest)))          #Max

names(mortality_rate_sum) = c("Continent",column_sum_headers)
graph = cases_by_country %>% filter(!is.na(Continent)) %>%
  ggplot( aes(x=get(variable_of_interest),fill = Continent)) +
    #geom_boxplot(position = "identity") + #density plot
    labs(fill = "Continent") + #color based on class type
    xlab(x_axis_label) + #x axis label
    ylab("Density") + #y axis label
    facet_grid(Continent~.) + #putting each class type in a different graph
    ggtitle(paste(x_axis_label, "by Global Continent")) + #adding average of each plot
    theme(plot.title = element_text(face = "bold",
                                                                    hjust = .5)
           #axis.text.y = element_blank()
       )
if(graph_type == "density"){
  print(graph + geom_density(position = "identity") + 
          geom_vline(data = mortality_rate_sum, aes(xintercept = get(paste(column_sum_headers[1]))), linetype = "dashed",col = "Black") ) }
else{ print(graph + geom_boxplot(position = "identity"))
  }
return(as_data_frame(mortality_rate_sum) %>% kbl() %>%
  kable_minimal())
}

This shows the Case Mortality Rate in terms of five continents: Africa, Americas, Asia, Europe, and Oceania. It appears that all five continents have relatively similar Case Mortality Rates, with Africa exhibiting the largest and Oceania showing the smallest. The graphs show that skewness exists in the data. For example, Asia appears to be right skewed, meaning that their average case mortality rate is greater than their median case mortality rate. Grouping by continents, however, is not necessarily a good way to evaluate Case Mortality Rate. Aggregating all the countries together into one continent is not accurate because each country has their own COVID restrictions. For example, The United States of America and Argentina have very different COVID policies, but are grouped together as one continent.

#variable_of_interest = "case_mortality_rate", abbreviation = "mr" ,x_axis_label = "Case Mortality Rate"

continent_sum("case_mortality_rate", "mr" ,"Case Mortality Rate","density")
Continent Avg_mr SD_mr Min_mr Q1_mr Med_mr Q3_mr Max_mr
Africa 0.0210039 0.0141002 0.0013141 0.0118675 0.0156296 0.0321103 0.0621740
Americas 0.0186915 0.0162453 0.0000000 0.0066466 0.0171675 0.0246783 0.0893326
Asia 0.0187897 0.0428910 0.0000000 0.0037222 0.0110780 0.0178521 0.2715427
Europe 0.0199444 0.0100682 0.0015198 0.0130195 0.0193246 0.0239845 0.0490196
Oceania 0.0092970 0.0120267 0.0000000 0.0000000 0.0037922 0.0139364 0.0317460
#continent_sum("Cumulative_cases", "cc" ,"Cumulative Cases")
#continent_sum("Cumulative_deaths", "cd" ,"Cumulative Deaths")
#continent_sum("cases_per_capita_2020", "cpc" ,"Cases Per Capita")
#continent_sum("deaths_per_capita_2020", "dpc" ,"Deaths Per Capita")

Descriptive Analysis: Mobility Data Spread for Countries

This graph plots the relationship between Retail Change Mobility and Case Mortality Rate. Retail Change Mobility represents changes in people going out to retail locations which include malls and department stores. Some countries, such as Afghanistan and Papa New Guinea, exibit positive changes in mobility despite COVID-19. This could be an indication that their governments have imposed loose COVID restrictions. Other countries, such as Peru and Argentina, show large decreases in Mobility. This could be an indication that their governments have imposed strict COVID restrictions.

ggplot(data=country_data, mapping=aes(x=Retail_Rec_change, y=case_mortality_rate)) +
  geom_point(outlier.shape=NA, color='black', size =0.75)+
  ylim(0,0.1)+
  xlab("Mobility (Retail Change)")+
  ylab("Case Mortality Rate")+
  ggtitle("Case Mortality vs Mobility (Retail Change)")+
  geom_label_repel(aes(label=ifelse( Retail_Rec_change >5 | Retail_Rec_change < -50 , as.character(Country),'')),
                   box.padding   = 1.5, 
                  point.padding = 0.5,
                  segment.color = 'red') +
  theme_classic()

This graph plots the relationship between Grocery Store Change Mobility and Case Mortality Rate. Grocery Store Change Mobility represents changes in people going out to supermarkets for food shopping. Some countries, such as Afghanistan and Mongolia, exibit positive changes in mobility despite COVID-19. This could be an indication that their governments have imposed loose COVID restrictions. Other countries, such as Panama, show large decreases in Mobility. This could be an indication that their governments have imposed strict COVID restrictions.

ggplot(data=country_data, mapping=aes(x=Grocery_Pharm_change, y= case_mortality_rate)) +
  geom_point(outlier.shape=NA, color='black', size =0.75)+
  ylim(0,0.1)+
  xlab("Mobility (Grocery Store Change)")+
  ylab("Case Mortality Rate")+
  ggtitle("Case Mortality vs Mobility (Grocery Store Change)")+
  geom_label_repel(aes(label=ifelse( Grocery_Pharm_change >20 | Grocery_Pharm_change < -25 , as.character(Country),'')),
                   box.padding   = 3, 
                  point.padding = 0.5,
                  segment.color = 'red') +
  theme_classic()

This graph plots the relationship between Park Change Mobility and Case Mortality Rate. Park Change Mobility represents changes in people going out to outdoor locations such as National Parks. Some countries, such as Denmark, exibit positive changes in mobility despite COVID-19. This could be an indication that their governments have imposed loose COVID restrictions. Other countries, such as Argentina, show large decreases in Mobility. This could be an indication that their governments have imposed strict COVID restrictions.


ggplot(data=country_data, mapping=aes(x=Parks_change, y=case_mortality_rate)) +
  geom_point(outlier.shape=NA, color='black', size =0.75)+
  ylim(0,0.1)+
  xlab("Mobility (Parks Change)")+
  ylab("Case Mortality Rate")+
  ggtitle("Case Mortality vs Mobility (Parks Change)")+
  geom_label_repel(aes(label=ifelse( Parks_change >50 | Parks_change < -50 , as.character(Country),'')),
                   box.padding   = 1.5, 
                  point.padding = 0.5,
                  segment.color = 'red') +
  theme_classic()

Mapping Global Case Data

Map of Cumulative Deaths Over Time

{

l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~Cumulative_deaths, 
              z = ~Cumulative_deaths,
              colors = "Blues",
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Cumulative Deaths", limits = c(0, 400000)) %>% 
    layout(title = "Cumulative Deaths by Country per Month <br> (Hover for details)", geo = g)
tm}

Map of New Cases Over Time

{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~New_cases, 
              z = ~New_cases,
              colors = "Blues",
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "New Cases", limits = c(0, 14000000)) %>% 
    layout(title = "New Cases by Country per Month <br>(Hover for details)", geo = g)
tm}

Map of Retail Mobility Change

{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~ret, 
              z = ~ret,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Retail and Recreation Mobility Change <br>(Hover for details)", geo = g)
tm
}

Map of Groceries Mobility Change

{

l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~gro, 
              z = ~gro,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Groceries and Pharmacy Mobility Change <br>(Hover for details)", geo = g)
tm}

Map of Parks Mobility Change

{

l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~par, 
              z = ~par,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Parks Mobility Change <br>(Hover for details)", geo = g)
tm
}

Map of Transit Stations Mobility Change

{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~tra, 
              z = ~tra,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Transit Stations Mobility Change <br>(Hover for details)", geo = g)
tm
}

Map of Workplace Mobility Change

{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~wor, 
              z = ~wor,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Workplace Mobility Change <br>(Hover for details)", geo = g)
tm
}

Map of Residential Mobility Change

{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~res, 
              z = ~res,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Residential Mobility Change <br>(Hover for details)", geo = g)
tm
}

Inferential analysis

Independent/Explanatory Variables:
X_1: Retail recreation Change
X_2: Grocery and Pharmacy Change
X_3: Parks Change
X_4: Transit Stations Change
X_5: Work Change
X_6: Residential Change

Regression Assumptions:
(1) Regression function is linear
(2) The \(\epsilon's\) do have constant variance.
(3) The \(\epsilon's\) are independent and identically distributed.
(4) The model fits all observations.
(5) The \(\epsilon's\) are normally distributed.

We would like to test whether type of mobility has an effect on the number of cases per capita, number of deaths per capita, and the case mortality rate for all the countries with data available. We will do that using a multiple regression test:

Model 1: Predicting Cases Per Capita

Model: \(Y1_i= \beta_0 + \beta_1X_1+ \beta_2X_2 + \beta_3X_3 + \beta_4X_4 + \beta_5X_5 + \beta_6X_6 + \epsilon_i\) where the index i represents each country, and \(X_j\) , \(j = 1, ..., 6\) are the predictor variables of the mobility data (Grocery and Pharmacy, Retail and Recreation, Parks, Residential, Transit stations, and Workplaces), and \(\beta_0 , \beta_1, \beta_2 , \beta_3 , \beta_4 , \beta_5 , \beta_6\) are the regression coefficients for the mobility data for each country. \(\epsilon_i\) are the errors for each country that the model can’t explain. \(Y_i\) are the number of cases per capita for each country i (number of cases per capita is the first of the three response variables, thus denoted by \(Y1_i\)/). The constraints on the parameters are sum-to-zero. The residuals for the model should also sum to zero. We will also test for interaction effects between the predictor variables, as we suspect that the transmissibility of COVID-19 is not limited to just one high risk predictor variable but likely may involve more than one.

The predictor variables of Retail and Recreation, Grocery and Pharmacy, and Parks are the most significant out of the six total variables. Because our p-values of 1.236 x 10^-8, 0.008123, 9.601 x 10^-9 < all common values of \(\alpha\), we reject the null hypothesis. We conclude that at least one of the regression coefficients are not equal, so the mobility data of Retail and Recreation, Grocery and Pharmacy, and Parks make a difference in the number of cases per capita for the countries overall. Additionally, the R^2 is not as large as we would like, 0.4479, which means that the six predictor variables only explain 44.79% of the variance in the response variable number of COVID 19 cases per capita. Thus we should refit our model first with just the three factors of Retail and Recreation, Grocery and Pharmacy, and Parks and then see if there are any additional interaction effects.

#cases per capita vs our six factors of Retail and Recreation, Grocery & Pharmacy, Parks, Transit Stations, Workplaces, and Residential Change
model1 = lm(data = country_data, cases_per_capita_2020 ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change +
            Transit_Stations_change +
            Work_change +
            Resid_change )
anova(model1)
summary(model1)
## 
## Call:
## lm(formula = cases_per_capita_2020 ~ Retail_Rec_change + Grocery_Pharm_change + 
##     Parks_change + Transit_Stations_change + Work_change + Resid_change, 
##     data = country_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -39.329 -13.279  -1.691  10.661  51.313 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -4.15341    6.66365  -0.623    0.534    
## Retail_Rec_change       -1.43190    0.30248  -4.734 6.72e-06 ***
## Grocery_Pharm_change     0.31903    0.26888   1.187    0.238    
## Parks_change             0.32299    0.07931   4.072 8.90e-05 ***
## Transit_Stations_change  0.09877    0.19677   0.502    0.617    
## Work_change             -0.43777    0.42561  -1.029    0.306    
## Resid_change            -0.95386    0.59521  -1.603    0.112    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.9 on 108 degrees of freedom
## Multiple R-squared:  0.4478, Adjusted R-squared:  0.4171 
## F-statistic:  14.6 on 6 and 108 DF,  p-value: 3.821e-12

Model 2: Predicting Deaths Per Capita

Model:\(Y2_i= \beta_0 + \beta_1X_1+\beta_2X_2 + \beta_3X_3 + β_4X_4 + \beta_5X_5 + \beta_6X_6 + ϵi\) where the index i represents each country, and \(X_j , j = 1, ..., 6\) are the predictor variables of the mobility data (Grocery and Pharmacy, Retail and Recreation, Parks, Residential, Transit stations, and Workplaces), and \(β_0, β_1, β_2, β_3, β_4, β_5, β_6\) are the regression coefficients for the mobility data for each country. ϵi are the errors for each country that the model can’t explain. \(Y2_i\) are the number of deaths per capita for each country i (number of deaths per capita is the second of the three response variables, thus denoted by Y2_i). The constraints on the parameters are sum-to-zero. The residuals for the model should also sum to zero. We will also test for interaction effects between the predictor variables.

The predictor variables of Retail and Recreation, Grocery and Pharmacy, and Parks are the most significant out of the six total variables. Because our p-values of 1.063 x 10^-8, 0.02678, 4.826 x 10^-8 < \(\alpha = 0.05\), we reject the null hypothesis. We conclude that at least one of the regression coefficients are not equal, so the mobility data of Retail and Recreation, Grocery and Pharmacy, and Parks make a difference in the number of COVID 19 related deaths per capita for the countries overall. Additionally, the R^2 is not as large as we would like, 0.4479, which means that the six predictor variables only explain 44.79% of the variance in the response variable number of COVID 19 related deaths per capita. Thus we should refit our model first with just the three factors of Retail and Recreation, Grocery and Pharmacy, and Parks and then see if there are any additional interaction effects. Additionally, the R^2 value of 0.4284 means that the six predictor variables explain only 42.84% of the variance in the response variable, the number of COVID-19 related deaths per capita. The amount of variance explained should ideally be above 80 percent to make better inferences so it would be in our best interest to refit the model with the significant variables and determine whether interaction effects are necessary.

#deaths per capita vs our six factors of Retail and Recreation, Grocery & Pharmacy, Parks, Transit Stations, Workplaces, and Residential Change
model2 = lm(data = country_data, deaths_per_capita_2020 ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change +
            Transit_Stations_change +
            Work_change +
            Resid_change )
anova(model2)
summary(model2)
## 
## Call:
## lm(formula = deaths_per_capita_2020 ~ Retail_Rec_change + Grocery_Pharm_change + 
##     Parks_change + Transit_Stations_change + Work_change + Resid_change, 
##     data = country_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8360 -0.2422 -0.1118  0.2161  1.0207 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -0.0872730  0.1427442  -0.611 0.542224    
## Retail_Rec_change       -0.0303720  0.0064794  -4.687 8.12e-06 ***
## Grocery_Pharm_change     0.0038865  0.0057597   0.675 0.501256    
## Parks_change             0.0068044  0.0016990   4.005 0.000114 ***
## Transit_Stations_change  0.0007632  0.0042152   0.181 0.856662    
## Work_change             -0.0045865  0.0091172  -0.503 0.615945    
## Resid_change            -0.0213653  0.0127501  -1.676 0.096691 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4048 on 108 degrees of freedom
## Multiple R-squared:  0.427,  Adjusted R-squared:  0.3952 
## F-statistic: 13.41 on 6 and 108 DF,  p-value: 2.565e-11

Model 3: Predicting Case Mortality Rate

Model:\(Y3_i= \beta_0 + \beta_1X_1+\beta_2X_2 + \beta_3X_3 + \beta_4X_4 + \beta_5X_5 + \beta_6X_6 + \epsilon_i\)
Where the index i represents each country, and \(X_j , j = 1, ..., 6\) are the predictor variables of the mobility data (Grocery and Pharmacy, Retail and Recreation, Parks, Residential, Transit stations, and Workplaces), and \(\beta_0, \beta_1, \beta_2, \beta_3,\beta_4, \beta_5,\beta_6\) are the regression coefficients for the mobility data for each country. \(\epsilon_i\) are the errors for each country that the model can’t explain. \(Y3_i\) are the case mortality rates for each country i (case mortality rates is the third of the three response variables, thus denoted by \(Y3_i\)). The constraints on the parameters are sum-to-zero. The residuals for the model should also sum to zero. We will also test for interaction effects between the predictor variables.

From the p-values in the table, we can see that none of our mobility data predictor variables are significant. Because our p-values > \(\alpha = 0.05\), we fail to reject the null hypothesis. We conclude that there is no difference between the regression coefficients, so the mobility data of six predictor variables do not make a difference in the case mortality rate for the countries overall. Additionally, the R^2 value is very small, 0.04591, which means that the current model has the six predictor variables explain ~ 4.6% of the variability in the response variable, case mortality rate. Given this information, it seems likely that the case mortality rate is not significantly affected by the mobility data. However, it may just be the case that the relationship between the predictor variables and the case mortality rate is nonlinear.

#case mortality rate vs our six factors of Retail and Recreation, Grocery & Pharmacy, Parks, Transit Stations, Workplaces, and Residential Change
model3 = lm(data = country_data, case_mortality_rate ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change +
            Transit_Stations_change +
            Work_change +
            Resid_change )
anova(model3)
summary(model3)
## 
## Call:
## lm(formula = case_mortality_rate ~ Retail_Rec_change + Grocery_Pharm_change + 
##     Parks_change + Transit_Stations_change + Work_change + Resid_change, 
##     data = country_data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.037936 -0.011807 -0.002888  0.005852  0.245487 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)  
## (Intercept)              2.469e-02  9.852e-03   2.506   0.0137 *
## Retail_Rec_change        3.982e-04  4.472e-04   0.890   0.3752  
## Grocery_Pharm_change     4.130e-04  3.975e-04   1.039   0.3012  
## Parks_change            -8.740e-05  1.173e-04  -0.745   0.4577  
## Transit_Stations_change  9.546e-05  2.909e-04   0.328   0.7435  
## Work_change             -3.805e-04  6.293e-04  -0.605   0.5466  
## Resid_change             3.586e-04  8.800e-04   0.407   0.6845  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02794 on 108 degrees of freedom
## Multiple R-squared:  0.04631,    Adjusted R-squared:  -0.006677 
## F-statistic: 0.874 on 6 and 108 DF,  p-value: 0.5166

Interactions Effects

Our secondary question of interest is which predictor variables are associated with the largest case mortality rate, the most number of cases per capita, and the most number of deaths per capita. From our p-values for models 1 and 2, we found that Retail and Recreation, Grocery and Pharmacy, and Parks are associated with the most number of cases per capita and most number of deaths per capita. However, since none of the six predictor variables had significant p-values or R^2 value, we cannot say that there is a predictor variable that is associated with the largest case mortality rate. Therefore we do not test for interaction effects for the third model.

To determine the best fit, we will take the three predictor variables that have significant p-values for models 1 and 2: Retail and Recreation, Grocery and Pharmacy, and Parks. For model 3 we will look at different interaction effects to determine if any of the prediction variables in combination explain more of the variation in the response variable case mortality rate.

Model 1 with Interaction Effects

Model:\(Y1_i= \beta_0 + \beta_1*X_1+ \beta_2*X_2 + \beta_3 *X_3 + \beta_4 *X_1*X_2 + \beta_5*X_1*X_3 + \beta_6*X_2*X_3 + \epsilon_i\), where the index i represents each country, and $ X_j, j = 1,2,3$ are the significant predictor variables of the mobility data (Grocery and Pharmacy, Retail and Recreation, and Parks), and \(\beta_0, \beta_1, \beta_2, \beta_3,\beta_4, \beta_5,\beta_6\) are the regression coefficients for the mobility data for each country. \(\beta_4 *X_1*X_2 + \beta_5*X_1*X_3 + \beta_6*X_2*X_3\) are the interaction effects between the three chosen predictor variables. $ _i$ are the errors for each country that the model can’t explain. \(Y1_i\) are the number of cases per capita for each country i (number of cases per capita is the first of the three response variables, thus denoted by \(Y1_i\)). The constraints on the parameters are sum-to-zero. The residuals for the model should also sum to zero.

From our new model for the number of cases per capita we can see that the \(R^2\) correlation coefficient is not that much larger than that of our original model, even with the interaction effects. This may mean that for the number of cases per capita the interaction between the three significant variables does not explain much more of its variation.

#REFITTED MODEL 1 WITH INTERACTION EFFECTS
#cases per capita vs our three factors of Retail and Recreation, Grocery & Pharmacy, and Parks
refit_model1 = lm(data = country_data, cases_per_capita_2020 ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change+ Grocery_Pharm_change:Parks_change + Grocery_Pharm_change:Retail_Rec_change + Parks_change:Retail_Rec_change+Grocery_Pharm_change:Parks_change:Retail_Rec_change )
anova(refit_model1)
summary(refit_model1)
## 
## Call:
## lm(formula = cases_per_capita_2020 ~ Retail_Rec_change + Grocery_Pharm_change + 
##     Parks_change + Grocery_Pharm_change:Parks_change + Grocery_Pharm_change:Retail_Rec_change + 
##     Parks_change:Retail_Rec_change + Grocery_Pharm_change:Parks_change:Retail_Rec_change, 
##     data = country_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -38.593 -11.887  -1.894  12.756  53.237 
## 
## Coefficients:
##                                                       Estimate Std. Error
## (Intercept)                                         -7.0389705  4.8331882
## Retail_Rec_change                                   -1.4290946  0.2233363
## Grocery_Pharm_change                                 0.7395853  0.3038804
## Parks_change                                         0.2521534  0.1562750
## Grocery_Pharm_change:Parks_change                   -0.0218734  0.0130713
## Retail_Rec_change:Grocery_Pharm_change               0.0248089  0.0117640
## Retail_Rec_change:Parks_change                      -0.0039967  0.0070012
## Retail_Rec_change:Grocery_Pharm_change:Parks_change -0.0001723  0.0003421
##                                                     t value Pr(>|t|)    
## (Intercept)                                          -1.456   0.1482    
## Retail_Rec_change                                    -6.399 4.24e-09 ***
## Grocery_Pharm_change                                  2.434   0.0166 *  
## Parks_change                                          1.614   0.1096    
## Grocery_Pharm_change:Parks_change                    -1.673   0.0972 .  
## Retail_Rec_change:Grocery_Pharm_change                2.109   0.0373 *  
## Retail_Rec_change:Parks_change                       -0.571   0.5693    
## Retail_Rec_change:Grocery_Pharm_change:Parks_change  -0.504   0.6155    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.69 on 107 degrees of freedom
## Multiple R-squared:  0.4647, Adjusted R-squared:  0.4297 
## F-statistic: 13.27 on 7 and 107 DF,  p-value: 3.198e-12

Model 2 with Interaction Effects

\(Y2_i= \beta_0 + \beta_1*X_1+ \beta_2*X_2 + \beta_3 *X_3 + \beta_4 *X_1*X_2 + \beta_5*X_1*X_3 + \beta_6*X_2*X_3 + \epsilon_i\), where the index i represents each country, and X_j, j = 1,2,3 are the significant predictor variables of the mobility data (Grocery and Pharmacy, Retail and Recreation, and Parks), and \(\beta_0, \beta_1, \beta_2, \beta_3,\beta_4, \beta_5,\beta_6\) are the regression coefficients for the mobility data for each country. $ _i$ are the errors for each country that the model can’t explain. $Y2_i $are the number of deaths per capita for each country i (number of deaths per capita is the second of the three response variables, thus denoted by \(Y2_i\)). The constraints on the parameters are sum-to-zero. The residuals for the model should also sum to zero.

From our new model for the number of deaths per capita we can see that the R^2 correlation coefficient is not that much larger than that of our original model, even with the interaction effects. Again, this most likely indicates that for the number of deaths per capita the interaction between the three significant variables does not explain much more of its variation.

#REFITTED MODEL 2 WITH INTERACTION EFFECTS
#deaths per capita vs our three factors of Retail and Recreation, Grocery & Pharmacy, and Parks
refit_model2 = lm(data = country_data, deaths_per_capita_2020 ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change+ Grocery_Pharm_change:Parks_change + Grocery_Pharm_change:Retail_Rec_change + Parks_change:Retail_Rec_change)
anova(refit_model2)
summary(refit_model2)
## 
## Call:
## lm(formula = deaths_per_capita_2020 ~ Retail_Rec_change + Grocery_Pharm_change + 
##     Parks_change + Grocery_Pharm_change:Parks_change + Grocery_Pharm_change:Retail_Rec_change + 
##     Parks_change:Retail_Rec_change, data = country_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.80954 -0.23484 -0.06574  0.17959  1.16452 
## 
## Coefficients:
##                                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)                            -2.077e-01  9.461e-02  -2.196  0.03026
## Retail_Rec_change                      -2.862e-02  4.571e-03  -6.261 7.92e-09
## Grocery_Pharm_change                    1.674e-02  6.357e-03   2.632  0.00972
## Parks_change                            6.095e-03  3.098e-03   1.967  0.05171
## Grocery_Pharm_change:Parks_change      -4.383e-04  1.904e-04  -2.301  0.02328
## Retail_Rec_change:Grocery_Pharm_change  7.359e-04  2.369e-04   3.106  0.00242
## Retail_Rec_change:Parks_change         -3.507e-05  1.391e-04  -0.252  0.80136
##                                           
## (Intercept)                            *  
## Retail_Rec_change                      ***
## Grocery_Pharm_change                   ** 
## Parks_change                           .  
## Grocery_Pharm_change:Parks_change      *  
## Retail_Rec_change:Grocery_Pharm_change ** 
## Retail_Rec_change:Parks_change            
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3912 on 108 degrees of freedom
## Multiple R-squared:  0.4649, Adjusted R-squared:  0.4352 
## F-statistic: 15.64 on 6 and 108 DF,  p-value: 7.505e-13

Sensitivity analysis

Diagnostics Model 1

Residuals vs Fitted

Our first plot for model 1 is the residual vs fitted plot, which should have the data points having no pattern and are equally on both sides of the zero line. We can see that the residual values are not distributed randomly and are scattered in a clear nonlinear pattern around the zero line, so it is reasonable to say that the errors do not have equal variance and the relationship between the factors of our model and our response of the number of COVID cases per capita is not linear. We also have three outliers of 9, 5, and 88.

plot(model1,1)


Normal Q-Q Plot of Residuals

Our second plot for model 1 is the Q-Q plot, which should have the data points look as close to linear as possible, in which case the error would be normal. In our plot we can see that the data points are very close to the trend line, so it looks approximately linear. However, there are also heavy tails which may indicate some variation in the data that prevents it from being more close to Normal. Additionally there are a few outliers in one tail end. It makes sense for the errors to be approximately normal in this case because the number of cases per capita is a very large population of people, and a large sample generally approximates to normal. Overall, it is reasonable to say that the errors are normally distributed, and so the assumption is met.

plot(model1,2)


Scale Location Plot

Our third plot for model 1 is the scale-location plot, which should have the data points exhibit no visible pattern, and this will tell us whether the groups have constant variance. In our plot the data points are spread very unevenly above and below the red line at standardized residual value of between 0.5 and 0.1, and there is an obvious nonlinear pattern. Thus we can deduce that the variance is nonconstant, and the assumption is not met.

plot(model1,3)

Residuals vs Leverage

The fourth plot for model 1 is the residual-leverage plot, which should have no visible influential points or outliers in the upper right or lower right corner. In our plot there do not seem to be any influential points or outliers (other than 98 and 82, but they are inside the cook’s line and thus not as influential), as the data all lie within the Cook’s distance red dashed line. Therefore it is reasonable to say that there are no influential cases or outliers.

plot(model1,5)

Diagnostics Model 2

Residuals vs Fitted

Our first plot for model 2 is the residual vs fitted plot, which should have the data points having no pattern and are equally on both sides of the zero line. We can see that the residual values are not distributed randomly and are looking to be clearly nonlinear around the zero line, so it is reasonable to say that the errors do not have equal variance and the relationship between the factors of our model and our response of the number of COVID deaths per capita is definitely not linear. We also have three outliers of 16, 98, and 13.

plot(model2,1) 

Normal Q-Q Plot of Residuals

Our second plot for model 2 is the Q-Q plot, which should have the data points look as close to linear as possible, in which case the error would be normal. In our plot we can see that the data points are not as close to the trend line, so it looks approximately linear, but it could be much closer to the trend line than it is. There are also very heavy tails which may indicate some variation in the data that prevents it from being more close to Normal. Additionally there are a few outliers in one tail end. From the plot we would have to make the model undergo more transformations to fit it normally, so we cannot say that the errors are normally distributed. It seems reasonable that the errors for number of deaths per capita may not be as close to normally distributed than for the models of the other response variables because the number of deaths could be underreported, there are no false positives for deaths, and the number of deaths is smaller than the number of cases of COVID-19, since not all people who contract the disease die.

plot(model2,2) 

Scale Location Plot

Our third plot for model 2 is the scale-location plot, which should have the data points exhibit no visible pattern, and this will tell us whether the groups have constant variance. In our plot the data points are spread very unevenly above and below the red line at standardized residual value of between 0.5 and 0.1, and there is an obvious nonlinear pattern. Thus we can deduce that the variance is nonconstant, and the assumption is not met.

plot(model2,3) 

Residuals vs Leverage

The fourth plot for model 2 is the residual-leverage plot, which should have no visible influential points or outliers in the upper right or lower right corner. In our plot there do not seem to be any influential points or outliers (other than 16,35, and 98, but they are inside the Cook’s line and thus not as influential), as the data all lie within the Cook’s distance red dashed line. Therefore it is reasonable to say that there are no influential cases or outliers.

plot(model2,5) 

Diagnostics Model 3

Residuals vs Fitted

Our first plot for model 3 is the residual vs fitted plot, which should have the data points having no pattern and are equally on both sides of the zero line. We can see that the residual values are not distributed randomly and are scattered in a clear pattern around a line close to the zero line, so it is reasonable to say that the errors do not have equal variance and the relationship between the factors of our model and our response of the case mortality rate is definitely not linear. We also have three outliers of 68,69, and 113.

plot(model3,1) 

Normal Q-Q Plot of Residuals

Our second plot for model 3 is the Q-Q plot, which should have the data points look as close to linear as possible, in which case the error would be normal. In our plot we can see that the data points are close to the trend line, so it looks approximately linear,and the tails are relatively light with three outliers of 68,69, and 113. The errors are normally distributed, so the assumption is met.

plot(model3,2) 

Scale Location Plot

Our third plot for model 3 is the scale-location plot, which should have the data points exhibit no visible pattern, and this will tell us whether the groups have constant variance. In our plot the data points are spread somewhat evenly above and below the red line at standardized residual value of between 0.5 and 0.1, and there is a large cluster between 0.015 vs 0.025. Thus we can deduce that the variance is nonconstant, and the assumption is not met.

plot(model3,3) 

Residuals vs Leverage

The fourth plot for model 3 is the residual-leverage plot, which should have no visible influential points or outliers in the upper right or lower right corner. In our plot there is definitely at least one influential point or outlier, which is 113. The data do not all lie within the Cook’s distance red dashed line so the assumption is not met.

plot(model3,5) 

For the three original models for: (1) number of cases per capita vs the six predictor variables of Grocery and Pharmacy, Retail and Recreation, Parks, Residential, Transit stations, and Workplaces
(2) number of deaths per capita vs the six predictor variables of Grocery and Pharmacy, Retail and Recreation, Parks, Residential, Transit stations, and Workplaces
(3) case mortality vs the six predictor variables of Grocery and Pharmacy, Retail and Recreation, Parks, Residential, Transit stations, and Workplaces

we found that the assumptions for none of them were met. We may need to conduct more testing and model fitting to determine the best fitting model, but it is likely that the model is nonlinear given our diagnostic plot analysis, and the fact that when we refit the first two models with interaction effects, the difference in the \(R^2\) value was not large (the third model was not refit as we found that none of the predictor variables could explain the variance in the case mortality rate). Another issue to consider is that because not a lot of the variation in the response variables of number of cases per capita, number of deaths per capita, and case mortality rate is explained by the aforementioned six predictor variables, there are many confounding variables that could be contributing to this variation which we cannot explain.


Conclusion

This report synthesized information from the WHO and Google Mobility Data to create a comprehensive analysis of mobility through six high risk categories and COVID-19 transmission (via number of deaths per capita, case mortality rate, and number of cases per capita). We wanted to understand whether these categories could have a significant effect on these relative measures of COVID 19 transmission, and if so, which categories were most significant.

The dataset from the WHO provided information on cumulative deaths and cases for 237 countries while the Google dataset provided information on mobility trends for 135 countries. Specifically, the google data contains the changes in mobility for Grocery and Pharmacy, Retail and Recreation, Parks, Residential, Transit stations, and Workplaces for each respective country. We conducted a descriptive analysis which included summary statistics, graphs, and maps of mobility data case mortality rate, and cumulative deaths.
Our inferential analysis included three models. Model 1 sought to find a relationship between the 6 distinct mobility variables and cases per capita using multiple regression. We concluded that at least one regression coefficient was not equal, namely, the mobility data of Retail and Recreation, Grocery and Pharmacy, and Parks make a difference in the number of cases per capita while Parks, Transit Stations, and Workplaces make no significant difference for the countries overall. Model 2 sought to find a relationship between the six distinct mobility variables and deaths per capita using multiple regression. We concluded that at least one regression coefficient was not equal, namely, the mobility data of Retail and Recreation, Grocery and Pharmacy, and Parks make a difference in the number of cases per capita while Parks, Transit Stations, and Workplaces make no significant difference in the countries overall. Model 3 sought to find a relationship between the 6 distinct mobility variables and case mortality rate using multiple regression. We concluded that there was no difference between the regression coefficients, so the mobility data of six predictor variables did not make a difference in the case mortality rate for the countries overall. Our secondary question of interest sought to answer which predictor variables were associated with the largest case mortality rate, the most number of cases per capita, and the most number of deaths per capita. We determined that Retail and Recreation, Grocery and Pharmacy, and Parks are the categories that are associated with the most number of cases per capita and number of deaths per capita, but there was no one category which was associated with the largest mortality rate.

Our Sensitivity analysis included a residuals vs fitted plot, a normal Q-Q plot of residuals, a scale location plot, and a residuals vs leverage plot for each model. Model 1 had diagnostics suggesting nonlinearity between factors and our response, normal distribution of errors, nonconstant variance, and non influential cases or outliers. Model 2 had diagnostics suggesting nonlinearity between factors and our response, non normal distribution of errors, nonconstant variance, and no influential cases or outliers. Model 3 had diagnostics suggesting nonlinearity between factors and our response, normal distribution of errors, nonconstant variance, and at least one influential case or outlier.

Because COVID-19 is still a relatively unstudied area, there are many confounding factors which could influence the analysis. We noticed that the R^2 correlation coefficients did not explain most of the variation of the response for our models, even when we adjusted them for interaction effects. The data may follow a nonlinear relationship between the predictor variables and the response variables, and even if we fit them they may still not be able to explain a lot of the variation. It is difficult to explain even for advanced epidemiologists how people can prevent themselves from contracting COVID-19 when they need to go to certain places which could be characterized as high risk. As a result, there are many confounding factors that we may not be able to test for but which have a significant impact nonetheless. With more progressive models and included nonlinearity in regression we can try to quantify more of the variation in the responses, but we would need more advanced techniques.

The results of our report could have major implications. For example, by knowing which changes in mobility result in the highest increase in deaths or cases, one could avoid high risk areas and potentially have a lower risk of exposure. This is particularly important knowledge for high-risk individuals who are deciding whether they should risk going to the supermarket or the local park for recreational activities. Knowing the risk factors beforehand could mean the difference between life or death.


Acknowledgement

Zhang, Zitong


Reference

Argentina’s violent enforcement of Covid-19 Rules. (2020, November 20). Retrieved March 05, 2021, from https://www.hrw.org/news/2020/11/20/argentinas-violent-enforcement-covid-19-rules

Badr, H. S., Du, H., Marshall, M., Dong, E., Squire, M. M., & Gardner, L. M. (2020). Association between mobility patterns and COVID-19 transmission in the USA: A mathematical modelling study. The Lancet Infectious Diseases, 20(11), 1247-1254. doi:10.1016/s1473-3099(20)30553-3

Chang, S., Pierson, E., Koh, P. W., Gerardin, J., Redbird, B., Grusky, D., & Leskovec, J. (2020). Mobility network models of COVID-19 explain inequities and inform reopening. Nature, 589(7840), 82-87. doi:10.1038/s41586-020-2923-3

Larsen, J. R., Martin, M. R., Martin, J. D., Kuhn, P., & Hicks, J. B. (2020). Modeling the Onset of Symptoms of COVID-19. Frontiers in Public Health, 8. doi:10.3389/fpubh.2020.00473

Oran, D. P., & Topol, E. J. (2021). Prevalence of Asymptomatic SARS-CoV-2 Infection. Annals of Internal Medicine, 174(2), 286-287. doi:10.7326/l20-1285

Nouvellet, P., Bhatia, S., Cori, A., Ainslie, K., Baguelin, M., Bhatt, S., . . . Donnelly, C. (2021, February 17). Reduction in mobility and COVID-19 Transmission. Retrieved March 05, 2021, from https://www.nature.com/articles/s41467-021-21358-2#citeas

Praharaj, S., & Han, H. (2020). A longitudinal study of the impact of human mobility on the incidence of COVID-19 in India. doi:10.1101/2020.12.21.20248523

Rezende, L. F., Thome, B., Schveitzer, M. C., Souza-Júnior, P. R., & Szwarcwald, C. L. (2020). Adults at high-risk of severe coronavirus disease-2019 (Covid-19) in Brazil. Revista De Saúde Pública, 54, 50. doi:10.11606/s1518-8787.2020054002596

Zucchino, D., & Abed, F. (2020, December 20). ‘Covid can’t COMPETE.’ in a Place mired in war, the virus is an afterthought. Retrieved March 05, 2021, from G. (n.d.). Community Mobility Reports. Retrieved March 5, 2021, from https://www.google.com/covid19/mobility/.

W. (n.d.). WHO COVID-19 Global Data. Retrieved March 5, 2021, from https://covid19.who.int/WHO-COVID-19-global-data.csv.

W. (n.d.). Population Total. Retrieved March 5, 2021, from https://data.worldbank.org/indicator/SP.POP.TOTL.
***

Appendix

knitr::opts_chunk$set(fig.pos = 'H',warning=FALSE,message=FALSE)
library(tidyverse)
library(readxl)
library(data.table)
library(cowplot)
library(kableExtra)
library(countrycode)
library(ggrepel)
library(nlme)
library(hablar)
library(plotly)
covidmobility <- read_csv("~/Downloads/Global_Mobility_Report.csv") %>%
    filter(is.na(sub_region_1)) %>%
    mutate(Date_reported = as.Date(date), Country_code = country_region_code) %>%
    select(-c("sub_region_1",
            "sub_region_2",
            "metro_area",
            "iso_3166_2_code",
            "census_fips_code"))

covidWHO <- read_csv("https://covid19.who.int/WHO-COVID-19-global-data.csv") %>%
    mutate(Date_reported = as.Date(Date_reported))

world_pop = read_csv("~/STA_141A_Project_2/World_Pop_Data.csv") %>%
    mutate(Country = name)
cases_by_country = covidWHO %>%  mutate(Year = substr(Date_reported,1,4),
                                 Month = substr(Date_reported,6,7))  %>% 
  na.omit() %>%
  group_by(Country_code,Country) %>% 
  summarise(
        Country_code = unique(Country_code),
           Country = unique(Country),
           WHO_region = unique(WHO_region),
           Cumulative_cases = max(Cumulative_cases),
           Cumulative_deaths = max(Cumulative_deaths),
           case_mortality_rate = Cumulative_deaths/Cumulative_cases) %>%
  mutate("Continent" = countrycode(Country,"country.name","continent") ) %>%
  inner_join(world_pop, by  = "Country" ) %>% 
  mutate(cases_per_capita_2020 = Cumulative_cases/pop2020,
         deaths_per_capita_2020 = Cumulative_deaths/pop2020)

cases_by_country
covid_data = inner_join(covidmobility,covidWHO,by = c("Country_code","Date_reported")) %>% 
  select(c(
           "Country_code",
           "Country",
           "Date_reported",
           "retail_and_recreation_percent_change_from_baseline",   
           "grocery_and_pharmacy_percent_change_from_baseline",
           "parks_percent_change_from_baseline",
           "transit_stations_percent_change_from_baseline",  
           "workplaces_percent_change_from_baseline",
           "residential_percent_change_from_baseline",
           "WHO_region",
           "New_cases",
           "Cumulative_cases",
           "New_deaths",
           "Cumulative_deaths")) %>%
  mutate(Year = substr(Date_reported,1,4),
         Month = substr(Date_reported,6,7)) 
country_data = covid_data %>% group_by(Country) %>% na.omit() %>% dplyr::summarise(Number_Cases = max(Cumulative_cases),
            Number_Deaths = max(Cumulative_deaths),
            case_mortality_rate = Number_Deaths/Number_Cases,
            Retail_Rec_change = mean(retail_and_recreation_percent_change_from_baseline),
            Grocery_Pharm_change = mean(grocery_and_pharmacy_percent_change_from_baseline),
            Parks_change = mean(parks_percent_change_from_baseline),
            Transit_Stations_change = mean(transit_stations_percent_change_from_baseline),
            Work_change = mean(workplaces_percent_change_from_baseline),
            Resid_change = mean(residential_percent_change_from_baseline))


country_data = inner_join(country_data, world_pop, by  = "Country" ) %>% 
  mutate(cases_per_capita_2020 = Number_Cases/pop2021,
         deaths_per_capita_2020 = Number_Deaths/pop2021)
country_data %>% select(-c(Number_Cases,Number_Deaths,case_mortality_rate))
covid_month_data_20 = covid_data %>% 
  na.omit() %>%
  #filter(Year == "2020") %>%
  group_by(Country_code,Country,Month,Year) %>% 
  summarise(
        Country_code = unique(Country_code),
           Country = unique(Country),
           retail_and_recreation_percent_change_from_baseline = mean(retail_and_recreation_percent_change_from_baseline),   
           grocery_and_pharmacy_percent_change_from_baseline =mean(grocery_and_pharmacy_percent_change_from_baseline) ,
           parks_percent_change_from_baseline = mean(parks_percent_change_from_baseline),
           transit_stations_percent_change_from_baseline = mean(transit_stations_percent_change_from_baseline),  
           workplaces_percent_change_from_baseline = mean(workplaces_percent_change_from_baseline),
           residential_percent_change_from_baseline = mean(residential_percent_change_from_baseline),
           WHO_region = unique(WHO_region),
           New_cases = sum(New_cases),
           Cumulative_cases = max(Cumulative_cases),
           New_deaths = sum(New_deaths),
           Cumulative_deaths = max(Cumulative_deaths) ) 
#covid_month_data_20
covid_month_data_20<- covid_month_data_20 %>% filter(Year == "2020")
#head(covid_month_data_20)

cd <- covid_month_data_20 %>% select(Country, Month, Country_code,
                                     Cumulative_deaths, Cumulative_cases, New_cases, New_deaths,
                                     ret = retail_and_recreation_percent_change_from_baseline, 
                                     gro = grocery_and_pharmacy_percent_change_from_baseline,
                                     par = parks_percent_change_from_baseline, 
                                     tra = transit_stations_percent_change_from_baseline, 
                                     wor = workplaces_percent_change_from_baseline, 
                                     res = residential_percent_change_from_baseline
                                     )
cd["Country_code"] <-   countrycode(cd$Country_code, "iso2c", "iso3c")
cdt <- cd %>% convert(num(Month))
continent_sum = function(variable_of_interest,abbreviation,x_axis_label,graph_type = "boxplot"){

sum_stats = c("Avg" ,"SD"  , "Min" , "Q1"  , "Med" , "Q3"  ,"Max" )
column_sum_headers = paste(sum_stats,"_",abbreviation,sep="")

mortality_rate_sum = cases_by_country %>%
    group_by(Continent) %>%     #grouping by class type
  na.omit() %>%
    summarise(      Avg   = mean(get(variable_of_interest)),         #mean
                              SD    = sd(get(variable_of_interest)),           #standard deviation
                            Min   = min(get(variable_of_interest)),          #min
                            Q1    = quantile(get(variable_of_interest),.25), #quartile 1
                            Med   = median(get(variable_of_interest)),       #median
                            Q3    = quantile(get(variable_of_interest),.75), #quartile 3
                            Max   = max(get(variable_of_interest)))          #Max

names(mortality_rate_sum) = c("Continent",column_sum_headers)
graph = cases_by_country %>% filter(!is.na(Continent)) %>%
  ggplot( aes(x=get(variable_of_interest),fill = Continent)) +
    #geom_boxplot(position = "identity") + #density plot
    labs(fill = "Continent") + #color based on class type
    xlab(x_axis_label) + #x axis label
    ylab("Density") + #y axis label
    facet_grid(Continent~.) + #putting each class type in a different graph
    ggtitle(paste(x_axis_label, "by Global Continent")) + #adding average of each plot
    theme(plot.title = element_text(face = "bold",
                                                                    hjust = .5)
           #axis.text.y = element_blank()
       )
if(graph_type == "density"){
  print(graph + geom_density(position = "identity") + 
          geom_vline(data = mortality_rate_sum, aes(xintercept = get(paste(column_sum_headers[1]))), linetype = "dashed",col = "Black") ) }
else{ print(graph + geom_boxplot(position = "identity"))
  }
return(as_data_frame(mortality_rate_sum) %>% kbl() %>%
  kable_minimal())
}
#variable_of_interest = "case_mortality_rate", abbreviation = "mr" ,x_axis_label = "Case Mortality Rate"

continent_sum("case_mortality_rate", "mr" ,"Case Mortality Rate","density")
#continent_sum("Cumulative_cases", "cc" ,"Cumulative Cases")
#continent_sum("Cumulative_deaths", "cd" ,"Cumulative Deaths")
#continent_sum("cases_per_capita_2020", "cpc" ,"Cases Per Capita")
#continent_sum("deaths_per_capita_2020", "dpc" ,"Deaths Per Capita")
ggplot(data=country_data, mapping=aes(x=Retail_Rec_change, y=case_mortality_rate)) +
  geom_point(outlier.shape=NA, color='black', size =0.75)+
  ylim(0,0.1)+
  xlab("Mobility (Retail Change)")+
  ylab("Case Mortality Rate")+
  ggtitle("Case Mortality vs Mobility (Retail Change)")+
  geom_label_repel(aes(label=ifelse( Retail_Rec_change >5 | Retail_Rec_change < -50 , as.character(Country),'')),
                   box.padding   = 1.5, 
                  point.padding = 0.5,
                  segment.color = 'red') +
  theme_classic()
ggplot(data=country_data, mapping=aes(x=Grocery_Pharm_change, y= case_mortality_rate)) +
  geom_point(outlier.shape=NA, color='black', size =0.75)+
  ylim(0,0.1)+
  xlab("Mobility (Grocery Store Change)")+
  ylab("Case Mortality Rate")+
  ggtitle("Case Mortality vs Mobility (Grocery Store Change)")+
  geom_label_repel(aes(label=ifelse( Grocery_Pharm_change >20 | Grocery_Pharm_change < -25 , as.character(Country),'')),
                   box.padding   = 3, 
                  point.padding = 0.5,
                  segment.color = 'red') +
  theme_classic()


ggplot(data=country_data, mapping=aes(x=Parks_change, y=case_mortality_rate)) +
  geom_point(outlier.shape=NA, color='black', size =0.75)+
  ylim(0,0.1)+
  xlab("Mobility (Parks Change)")+
  ylab("Case Mortality Rate")+
  ggtitle("Case Mortality vs Mobility (Parks Change)")+
  geom_label_repel(aes(label=ifelse( Parks_change >50 | Parks_change < -50 , as.character(Country),'')),
                   box.padding   = 1.5, 
                  point.padding = 0.5,
                  segment.color = 'red') +
  theme_classic()

{

l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~Cumulative_deaths, 
              z = ~Cumulative_deaths,
              colors = "Blues",
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Cumulative Deaths", limits = c(0, 400000)) %>% 
    layout(title = "Cumulative Deaths by Country per Month <br> (Hover for details)", geo = g)
tm}
{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~New_cases, 
              z = ~New_cases,
              colors = "Blues",
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "New Cases", limits = c(0, 14000000)) %>% 
    layout(title = "New Cases by Country per Month <br>(Hover for details)", geo = g)
tm}
{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~ret, 
              z = ~ret,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Retail and Recreation Mobility Change <br>(Hover for details)", geo = g)
tm
}
{

l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~gro, 
              z = ~gro,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Groceries and Pharmacy Mobility Change <br>(Hover for details)", geo = g)
tm}
{

l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~par, 
              z = ~par,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Parks Mobility Change <br>(Hover for details)", geo = g)
tm
}

{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~tra, 
              z = ~tra,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Transit Stations Mobility Change <br>(Hover for details)", geo = g)
tm
}

{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~wor, 
              z = ~wor,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Workplace Mobility Change <br>(Hover for details)", geo = g)
tm
}

{
l <- list(color = toRGB("black"), width = 0.7)
g <- list(showframe = T,showland=T, landcolor="white", showcoastlines = T,bgcolor = "lightsteelblue1", showocean=T, oceancolor="aliceblue",
          projection = list(type = "natural earth"))
tm <- cdt %>% plot_geo(locationmode = "ISO-3", frame = ~Month) 
tm <- tm %>% add_trace(locations=~Country_code, 
              color = ~res, 
              z = ~res,
              colorscale = list(c(0, "rgb(255, 0, 0)"), list(0.5, "rgb(255, 255, 255)"), list(1, "rgb(0, 0, 255)")),
              marker = list(line = l),
              text = ~Country) %>% 
    colorbar(title = "Mobility Change", limits = c(-100, 100)) %>% 
    layout(title = "Residential Mobility Change <br>(Hover for details)", geo = g)
tm
}
#cases per capita vs our six factors of Retail and Recreation, Grocery & Pharmacy, Parks, Transit Stations, Workplaces, and Residential Change
model1 = lm(data = country_data, cases_per_capita_2020 ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change +
            Transit_Stations_change +
            Work_change +
            Resid_change )
anova(model1)
summary(model1)
#deaths per capita vs our six factors of Retail and Recreation, Grocery & Pharmacy, Parks, Transit Stations, Workplaces, and Residential Change
model2 = lm(data = country_data, deaths_per_capita_2020 ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change +
            Transit_Stations_change +
            Work_change +
            Resid_change )
anova(model2)
summary(model2)


#case mortality rate vs our six factors of Retail and Recreation, Grocery & Pharmacy, Parks, Transit Stations, Workplaces, and Residential Change
model3 = lm(data = country_data, case_mortality_rate ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change +
            Transit_Stations_change +
            Work_change +
            Resid_change )
anova(model3)
summary(model3)
#REFITTED MODEL 1 WITH INTERACTION EFFECTS
#cases per capita vs our three factors of Retail and Recreation, Grocery & Pharmacy, and Parks
refit_model1 = lm(data = country_data, cases_per_capita_2020 ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change+ Grocery_Pharm_change:Parks_change + Grocery_Pharm_change:Retail_Rec_change + Parks_change:Retail_Rec_change+Grocery_Pharm_change:Parks_change:Retail_Rec_change )
anova(refit_model1)
summary(refit_model1)
#REFITTED MODEL 2 WITH INTERACTION EFFECTS
#deaths per capita vs our three factors of Retail and Recreation, Grocery & Pharmacy, and Parks
refit_model2 = lm(data = country_data, deaths_per_capita_2020 ~ Retail_Rec_change +
            Grocery_Pharm_change+
            Parks_change+ Grocery_Pharm_change:Parks_change + Grocery_Pharm_change:Retail_Rec_change + Parks_change:Retail_Rec_change)
anova(refit_model2)
summary(refit_model2)
plot(model1,1)
plot(model1,2)
plot(model1,3)
plot(model1,5)
plot(model2,1) 
plot(model2,2) 
plot(model2,3) 
plot(model2,5) 
plot(model3,1) 
plot(model3,2) 
plot(model3,3) 
plot(model3,5) 
sessionInfo()

Session info

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] plotly_4.9.3      hablar_0.3.0      nlme_3.1-148      ggrepel_0.9.1    
##  [5] countrycode_1.2.0 kableExtra_1.3.1  cowplot_1.1.0     data.table_1.13.0
##  [9] readxl_1.3.1      forcats_0.5.0     stringr_1.4.0     dplyr_1.0.2      
## [13] purrr_0.3.4       readr_1.3.1       tidyr_1.1.2       tibble_3.0.3     
## [17] ggplot2_3.3.2     tidyverse_1.3.0  
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.5         lubridate_1.7.9    lattice_0.20-41    assertthat_0.2.1  
##  [5] digest_0.6.25      R6_2.4.1           cellranger_1.1.0   backports_1.1.10  
##  [9] reprex_0.3.0       evaluate_0.14      highr_0.8          httr_1.4.2        
## [13] pillar_1.4.6       rlang_0.4.10       curl_4.3           lazyeval_0.2.2    
## [17] rstudioapi_0.11    blob_1.2.1         rmarkdown_2.3      labeling_0.3      
## [21] webshot_0.5.2      htmlwidgets_1.5.3  munsell_0.5.0      broom_0.7.0       
## [25] compiler_4.0.2     modelr_0.1.8       xfun_0.17          pkgconfig_2.0.3   
## [29] htmltools_0.5.0    tidyselect_1.1.0   fansi_0.4.1        viridisLite_0.3.0 
## [33] crayon_1.3.4       dbplyr_1.4.4       withr_2.2.0        grid_4.0.2        
## [37] jsonlite_1.7.1     gtable_0.3.0       lifecycle_0.2.0    DBI_1.1.0         
## [41] magrittr_1.5       scales_1.1.1       cli_2.0.2          stringi_1.5.3     
## [45] farver_2.0.3       fs_1.5.0           xml2_1.3.2         ellipsis_0.3.1    
## [49] generics_0.0.2     vctrs_0.3.4        RColorBrewer_1.1-2 tools_4.0.2       
## [53] glue_1.4.2         crosstalk_1.1.1    hms_0.5.3          yaml_2.2.1        
## [57] colorspace_1.4-1   rvest_0.3.6        knitr_1.29         haven_2.3.1